A hierarchical and scalable model for contemporary document image segmentation
Internal identifier: 000209 (Main/Exploration); previous: 000208; next: 000210
Authors: Asma Ouji [France]; Yann Leydier [France]; Frank Lebourgeois [France]
Source:
- Pattern analysis and applications : (Print) [ 1433-7541 ] ; 2013.
French descriptors
- Pascal (Inist)
- Traitement document, Traitement image, Image couleur, Numérisation, Extensibilité, Perte information, Analyse documentaire, Reconnaissance image, Vision ordinateur, Texte, Reconnaissance optique caractère, Reconnaissance caractère, Structure document, Quantification signal, Publicité, Système hiérarchisé, Robustesse, Méthode adaptative, Modélisation, Image bruitée, Segmentation image.
- Wicri :
- topic : Numérisation, Publicité.
English descriptors
- KwdEn :
- Adaptive method, Advertising, Character recognition, Color image, Computer vision, Digitizing, Document analysis, Document processing, Document structure, Hierarchical system, Image processing, Image recognition, Image segmentation, Information loss, Modeling, Noisy image, Optical character recognition, Robustness, Scalability, Signal quantization, Text.
Abstract
In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. This system is scalable, hierarchical, versatile and completely automated, i.e. user independent. It proposes an adaptive binarization/quantization without any penalizing information loss. This model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to highlight our contribution to the well-known OCR product ABBYY FineReader. We also obtain promising results with our ad detection system on a large set of test images with complex layouts.
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000026
- to stream PascalFrancis, to step Curation: 000738
- to stream PascalFrancis, to step Checkpoint: 000052
- to stream Main, to step Merge: 000212
- to stream Main, to step Curation: 000209
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">A hierarchical and scalable model for contemporary document image segmentation</title>
<author><name sortKey="Ouji, Asma" sort="Ouji, Asma" uniqKey="Ouji A" first="Asma" last="Ouji">Asma Ouji</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Leydier, Yann" sort="Leydier, Yann" uniqKey="Leydier Y" first="Yann" last="Leydier">Yann Leydier</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">14-0075945</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 14-0075945 INIST</idno>
<idno type="RBID">Pascal:14-0075945</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000026</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000738</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000052</idno>
<idno type="wicri:doubleKey">1433-7541:2013:Ouji A:a:hierarchical:and</idno>
<idno type="wicri:Area/Main/Merge">000212</idno>
<idno type="wicri:Area/Main/Curation">000209</idno>
<idno type="wicri:Area/Main/Exploration">000209</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">A hierarchical and scalable model for contemporary document image segmentation</title>
<author><name sortKey="Ouji, Asma" sort="Ouji, Asma" uniqKey="Ouji A" first="Asma" last="Ouji">Asma Ouji</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Leydier, Yann" sort="Leydier, Yann" uniqKey="Leydier Y" first="Yann" last="Leydier">Yann Leydier</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, 20 av. Albert Einstein</s1>
<s2>Villeurbanne 69621</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Pattern analysis and applications : (Print)</title>
<title level="j" type="abbreviated">Pattern anal. appl. : (Print)</title>
<idno type="ISSN">1433-7541</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Pattern analysis and applications : (Print)</title>
<title level="j" type="abbreviated">Pattern anal. appl. : (Print)</title>
<idno type="ISSN">1433-7541</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Adaptive method</term>
<term>Advertising</term>
<term>Character recognition</term>
<term>Color image</term>
<term>Computer vision</term>
<term>Digitizing</term>
<term>Document analysis</term>
<term>Document processing</term>
<term>Document structure</term>
<term>Hierarchical system</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Image segmentation</term>
<term>Information loss</term>
<term>Modeling</term>
<term>Noisy image</term>
<term>Optical character recognition</term>
<term>Robustness</term>
<term>Scalability</term>
<term>Signal quantization</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Traitement document</term>
<term>Traitement image</term>
<term>Image couleur</term>
<term>Numérisation</term>
<term>Extensibilité</term>
<term>Perte information</term>
<term>Analyse documentaire</term>
<term>Reconnaissance image</term>
<term>Vision ordinateur</term>
<term>Texte</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Structure document</term>
<term>Quantification signal</term>
<term>Publicité</term>
<term>Système hiérarchisé</term>
<term>Robustesse</term>
<term>Méthode adaptative</term>
<term>Modélisation</term>
<term>Image bruitée</term>
<term>Segmentation image</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Numérisation</term>
<term>Publicité</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. This system is scalable, hierarchical, versatile and completely automated, i.e. user independent. It proposes an adaptive binarization/quantization without any penalizing information loss. This model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to highlight our contribution to the well-known OCR product ABBYY FineReader. We also obtain promising results with our ad detection system on a large set of test images with complex layouts.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Auvergne-Rhône-Alpes</li>
<li>Rhône-Alpes</li>
</region>
</list>
<tree><country name="France"><region name="Auvergne-Rhône-Alpes"><name sortKey="Ouji, Asma" sort="Ouji, Asma" uniqKey="Ouji A" first="Asma" last="Ouji">Asma Ouji</name>
</region>
<name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
<name sortKey="Leydier, Yann" sort="Leydier, Yann" uniqKey="Leydier Y" first="Yann" last="Leydier">Yann Leydier</name>
</country>
</tree>
</affiliations>
</record>
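As an illustration of how the record above could be read programmatically, here is a minimal Python sketch that extracts the title, authors and English keywords. It is not part of Dilib. The record uses the prefixes `wicri:` and `inist:` without declaring them, so a strict XML parser would reject it as is; the sketch works around this by wrapping the record in a hypothetical `wrapper` element that binds both prefixes to placeholder URIs (these URIs are assumptions, not official namespaces). The function names `parse_record` and `summarize` and the shortened `SAMPLE` record are also illustrative only.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record uses the prefixes
# "wicri:" and "inist:" but never declares them.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}


def parse_record(record_xml: str) -> ET.Element:
    """Bind the undeclared prefixes via a wrapper element, then parse.

    Assumes record_xml starts directly with <record> (no XML declaration).
    """
    decls = " ".join(f'xmlns:{p}="{uri}"' for p, uri in PLACEHOLDER_NS.items())
    wrapped = f"<wrapper {decls}>{record_xml}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def summarize(record: ET.Element) -> dict:
    """Collect the title, author names and English keywords of one record."""
    title = record.findtext(".//titleStmt/title")
    authors = [n.text for n in record.findall(".//titleStmt/author/name")]
    keywords = [t.text for t in record.findall(".//keywords[@scheme='KwdEn']/term")]
    return {"title": title, "authors": authors, "keywords": keywords}


# Shortened excerpt with the same element layout as the full record.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">A hierarchical and scalable model for contemporary document image segmentation</title>
<author><name sortKey="Ouji, Asma">Asma Ouji</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s3>FRA</s3></inist:fA14></affiliation>
</author>
</titleStmt></fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Adaptive method</term>
<term>Advertising</term>
</keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

if __name__ == "__main__":
    print(summarize(parse_record(SAMPLE)))
```

Restricting the author query to `titleStmt` avoids counting each author twice, since the same author blocks are repeated under `sourceDesc`.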
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000209 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000209 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:14-0075945 |texte= A hierarchical and scalable model for contemporary document image segmentation }}
This area was generated with Dilib version V0.6.32.